Empirical data on corpus design and usage in biomedical natural language processing

Authors

  • K. Bretonnel Cohen
  • Philip V. Ogren
  • Lynne M. Fox
  • Lawrence Hunter
Abstract

This paper describes the design of six publicly available biomedical corpora. We then present usage data for the six corpora. We show that corpora that are carefully annotated with respect to structural and linguistic characteristics and that are distributed in standard formats are more widely used than corpora that are not. These findings have implications for the design of the next generation of biomedical corpora.

Similar resources

Corpus Design For Biomedical Natural Language Processing

This paper classifies six publicly available biomedical corpora according to various corpus design features and characteristics. We then present usage data for the six corpora. We show that corpora that are carefully annotated with respect to structural and linguistic characteristics and that are distributed in standard formats are more widely used than corpora that are not. These findings have...

Corpus based coreference resolution for Farsi text

"Coreference resolution", or finding all expressions in a text that refer to the same entity, is an important requirement in natural language processing. Two words are coreferent when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...

Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU

Digital Breast Tomosynthesis (DBT) is a technology that creates three-dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is on the order of seconds, tomosynthesis systems can be used to perform tomosynthesis-guided interventional procedures. This research has been designed to study u...

Development of a Greek biomedical corpus

Collection and annotation of specialized corpora for less-spoken languages such as Greek is a crucial endeavour for the development and growth of language technology research for these languages. This paper presents the design and compilation of a biomedical corpus that took place in the framework of the national R&D project “IATROLEXI” (http://www.iatrolexi.gr). The aim of IATROLEXI is to ...

Bootstrapping a Romanian Corpus for Medical Named Entity Recognition

Named Entity Recognition (NER) is an important component of natural language processing (NLP), with applicability in the biomedical domain, enabling knowledge discovery from medical texts. Because only a few linguistic resources specific to the biomedical domain exist for the Romanian language, we have created a sub-corpus specific to this domain. In this paper we present a new...

Journal title:
  • AMIA ... Annual Symposium proceedings. AMIA Symposium

Publication date: 2005